Revisiting Success in Music Streaming: A Data-Driven Predictive Approach

 

Juan D Montoro-Pons & Manuel Cuadrado-García

Universitat de València

María Luisa Palma-Martos

Universidad de Sevilla

Highlights

 

  • Incorporate cultural practices (via unstructured data sources) into empirical models to enrich the representation of actors in the cultural sectors
  • Include non-parametric/flexible statistical learning models in the toolbox of quantitative cultural economics
  • Assess empirical models based on out-of-sample performance providing robustness to findings (does the model perform on unseen data?)
  • Use alternative tools to empirically interpret the contribution of track- and artist-related features to chart performance

The ultimate goal is to identify which features affect success in the streaming economy and to learn causal effects using flexible machine learning tools

Commercial success in the streaming economy

 

Performance in streaming charts has been explored extensively from a managerial/economic standpoint as well as through broader interdisciplinary lenses. Survival in charts has been found to be associated with being signed to a major label and the level of competition (Kaimann, Tanneberg, and Cox 2021), the influence of media exposure and electronic word of mouth (Lee and Kim 2024), and early adoption of new genres/subgenres (Sobchuk, Youngblood, and Morin 2024).

Individual sonic features of songs (both objective and perceptual) have been proposed as predictors of success: Askin and Mauskapf (2017) introduce a metric of the typicality of a song based on its features (and suggest a similarity/differentiation tradeoff). (See also Interiano et al. 2018; Saragih 2023.)

Artists’ collaborations have been a specific focus in explaining streaming performance: the impact of joint efforts on commercial outcomes and the benefits of increasing the dissimilarity between collaborating artists have both been analyzed from a causal standpoint (Ordanini, Nunes, and Nanni 2018; McKenzie, Crosby, and Lenten 2021).

Collaboration networks allow researchers to use artist association topologies as predictors (Kang, Mandulak, and Szymanski 2022) and to identify popularity bias in centrality measures, i.e., how fame skews network importance metrics (South, Roughan, and Mitchell 2020).

The dataset

 

  • The primary dataset is retrieved from Spotify’s global weekly chart, covering the period from 29/09/2013 to 23/01/2025.

  • The sampling unit is a track (song)

  • Using web scraping and the Spotify API, we collect information about a track’s success (peak position on charts, weeks at peak position, maximum weekly streams, total streams, and a popularity index) and a set of track and artist features, including genres, album type, release date, artist(s) popularity and followers, whether the track is a collaboration, the markets in which the album is available, and audio features of the track, among others. Features can be classified as

    • Track-specific
    • Album-related
    • Artist(s)-related (e.g., online tags from LastFM and MusicBrainz)
  • The dataset includes information on 4153 tracks by 1670 unique artists.
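The per-track record assembled in this step can be sketched as a pure function over the JSON payloads returned by the Spotify Web API. Field names ("popularity", "followers.total", etc.) follow the public API; the helper and the mock payloads are illustrative, not the authors' exact pipeline.

```python
# Sketch: flattening Spotify Web API track/artist payloads into one record.
# Illustrative only; the real pipeline also merges chart, LastFM, and
# MusicBrainz data.

def extract_features(track: dict, artists: list) -> dict:
    """Build one sampling unit (a track) from raw API responses."""
    return {
        "track_id": track["id"],
        "track_popularity": track["popularity"],
        "explicit": track["explicit"],
        "duration_ms": track["duration_ms"],
        "album_type": track["album"]["album_type"],
        "release_date": track["album"]["release_date"],
        "number_markets": len(track["available_markets"]),
        "collaboration": len(artists) > 1,
        "number_collaborators": len(artists),
        "followers": [a["followers"]["total"] for a in artists],
        "artist_popularity": [a["popularity"] for a in artists],
    }

# Minimal mock payloads shaped like the API responses:
track = {"id": "t1", "popularity": 80, "explicit": False,
         "duration_ms": 201000,
         "album": {"album_type": "single", "release_date": "2024-05-03"},
         "available_markets": ["US", "GB", "ES"]}
artists = [{"popularity": 90, "followers": {"total": 1_000_000}},
           {"popularity": 70, "followers": {"total": 50_000}}]
row = extract_features(track, artists)
```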

Descriptive analysis

Describing the dataset: tracks

 

variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
weeks 0 1.000 18.745 33.810 1.000 1.000 6.000 22.000 407.000 ▇▁▁▁▁
top_ten 3680 0.114 8.951 8.917 1.000 2.000 6.000 14.000 64.000 ▇▂▁▁▁
streams 0 1.000 155.566 337.316 0.174 8.262 32.860 146.228 4604.759 ▇▁▁▁▁
peek_position 0 1.000 80.513 59.874 1.000 27.000 69.000 127.000 240.000 ▇▅▃▃▁
track_popularity 0 1.000 54.469 23.105 0.000 50.000 60.000 69.000 90.000 ▂▁▃▇▃
duration_s 0 1.000 208.258 49.196 35.240 178.107 203.641 230.953 613.027 ▁▇▁▁▁
loudness 0 1.000 -6.305 2.491 -34.475 -7.497 -5.899 -4.663 1.509 ▁▁▁▇▇
acousticness 0 1.000 0.216 0.235 0.000 0.037 0.125 0.320 0.994 ▇▂▁▁▁
instrumentalness 0 1.000 0.013 0.075 0.000 0.000 0.000 0.000 0.927 ▇▁▁▁▁
speechiness 0 1.000 0.126 0.118 0.023 0.045 0.074 0.172 0.966 ▇▂▁▁▁
danceability 0 1.000 0.681 0.142 0.150 0.590 0.695 0.786 0.985 ▁▂▅▇▃
energy 0 1.000 0.642 0.166 0.028 0.537 0.657 0.766 0.989 ▁▂▆▇▃
valence 0 1.000 0.484 0.222 0.032 0.313 0.481 0.656 0.976 ▃▇▇▆▃

Describing the dataset: artists and collaborations

 

  • The dataset includes information on 1670 unique artists.

  • Of all the tracks, 45% are collaborations between artists, while 55% are solo tracks

  • Artist roles are split among:

    • solo (34% of occurrences)
    • lead (27%)
    • feature (39%).
  • The most frequent collaborations are between two artists: the mean, median, and third quartile of the number of collaborators are 2.89, 2, and 3, respectively.

artist_name tracks collab_ratio
Drake 172 61%
Bad Bunny 110 57.3%
Taylor Swift 98 13.3%
Travis Scott 84 73.8%
Future 73 76.7%
Ariana Grande 66 40.9%
21 Savage 63 76.2%
The Weeknd 61 44.3%
Ed Sheeran 56 50%
Lil Baby 56 78.6%

Describing the dataset: more on collaborations

 

  • The Spotify API provides a popularity index for each artist, as well as the artist’s number of followers on the platform.

  • For each collaboration we compute:

    • The joint popularity
    • The popularity_ratio of each artist
    • The joint number of followers
    • The followers_ratio of each artist
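The per-collaboration metrics above can be sketched as follows. The exact normalization of the ratios is not pinned down here; this sketch divides each artist's value by the collaboration total (a share), noting the mean-based variant as an alternative.

```python
# Sketch of per-collaboration metrics. Assumed normalization: each artist's
# value divided by the collaboration total; dividing by the collaboration
# mean instead would center the ratios around 1. Toy values.
popularity = {"lead": 85, "feature": 70}
followers = {"lead": 2_000_000, "feature": 5_000_000}

joint_popularity = sum(popularity.values())   # joint popularity
joint_followers = sum(followers.values())     # joint number of followers

popularity_ratio = {a: p / joint_popularity for a, p in popularity.items()}
followers_ratio = {a: f / joint_followers for a, f in followers.items()}

# Boolean asymmetry indicators (compare each artist to the collab mean):
mean_pop = joint_popularity / len(popularity)
more_popular = {a: p > mean_pop for a, p in popularity.items()}
```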

Asymmetric collaborations

 

measure Role: feature Role: lead
% of artists that are more popular 0.46 0.54
% of artists that have more followers 0.39 0.55
Average popularity ratio 0.99 1.02
Average followers ratio 0.89 1.17
Median popularity ratio 0.99 1.01
Median followers ratio 0.71 1.12

 

Outcome Collab FALSE Collab TRUE
popularity_track 55.61 53.28
weeks_in_lists 19.60 17.11
top_ten 8.60 9.17
peek 80.07 82.14
times_peek 3.19 3.69
streams 163.56 139.72

Engineering rich artist representations

Collaboration networks

 

id label degree eigen closeness betweenness
3TVXtAsR1Inumwj472S9r4 Drake 132 1.0000000 0.0003401 31930.107
0Y5tJX1MQlPlqiwlOH1tJY Travis Scott 109 0.7049464 0.0003357 17987.294
4q3ewBCX7sLwd24euuV69X Bad Bunny 93 0.0559605 0.0003233 35985.381
1vyhD5VmyZ7KMfW5gqLgo5 J Balvin 84 0.0500565 0.0003630 53044.032
2R21vXR83lH98kGeO99Y66 Anuel AA 81 0.0346626 0.0003212 20480.496
1RyvyyTE3xzB2ZywiAwp0i Future 79 0.7645781 0.0003185 14613.223
50co4Is1HCEo8bhOyUWKpn Young Thug 78 0.6576292 0.0003368 11283.802
1i8SpTcr7yvPOmcqrbnVXY Ozuna 76 0.0321060 0.0003227 14830.585
77ziqFxp5gaInVrF2lj4ht Sech 73 0.0294024 0.0002987 3370.446
1URnnhqYAYcrqrcwql10ft 21 Savage 67 0.7061541 0.0003232 8718.683
5f7VJjfbwm532GiveGC0ZK Lil Baby 66 0.4302981 0.0003224 13987.667
1pQWsZQehhS4wavwh7Fnxd Lenny Tavárez 62 0.0211659 0.0002734 1332.630
0KPX4Ucy9dk82uj4GpKesn Dalex 61 0.0206827 0.0002710 1275.171
1SupJlEpv7RS2tPNRaHViT Nicky Jam 61 0.0256472 0.0003006 6835.123
329e4yvIujISKGKz1BZZbO Farruko 61 0.0247795 0.0003042 10499.564
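The four centrality measures tabulated above can be reproduced on a toy collaboration graph with networkx (artists as nodes, an edge whenever two artists share a track). The normalizations networkx applies may differ in scale from the table.

```python
# Sketch: degree, eigenvector, closeness, and betweenness centrality on a
# toy collaboration graph. The real graph is built from the chart dataset.
import networkx as nx

G = nx.Graph()
G.add_edges_from([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])

degree = dict(G.degree())                                 # raw degree counts
eigen = nx.eigenvector_centrality(G, max_iter=1000)       # unit-norm scaled
closeness = nx.closeness_centrality(G)
betweenness = nx.betweenness_centrality(G, normalized=False)
```

Node C bridges D to the rest of the graph, so it picks up all of the (unnormalized) betweenness.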

Tags: LastFM

Tags: MusicBrainz

Tag mapping

 

  1. Started with a dataset containing a list of music artists.

  2. Tag collection via LastFM API

    • Retrieved user-generated tags for each artist using the LastFM API.
    • Tags reflect how listeners perceive and describe each artist (genre-related such as “rock”, “indie”, “electronic”, but also unrelated such as “best of 2024”, “seen live”, or “worst ive heard today”).
  3. Building the artist-tag matrix

    • Constructed a Document-Term Matrix (DTM):
      • A DTM is a way to represent text data (or in this case, tag data) numerically
      • Think of it as a large table: Rows represent artists; Columns represent tags.
      • Each cell shows how strongly a tag is associated with an artist (e.g., frequency or weight).
    • Once we have the DTM, we can compute similarity between artists:
      • Cosine similarity: measures how similar two tag vectors are, regardless of their magnitude.
      • Other transformations (e.g., TF-IDF) can refine the matrix for better comparisons.
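A minimal numeric sketch of the artist-tag DTM and cosine similarity, using toy tag counts (TF-IDF weighting would be a drop-in refinement):

```python
# Sketch: rows are artists, columns are tags, cells are tag counts.
import numpy as np

tags = ["rock", "indie", "electronic", "seen live"]
dtm = np.array([
    [10, 4, 0, 2],   # artist 0
    [8,  6, 1, 0],   # artist 1: tag profile close to artist 0
    [0,  1, 9, 3],   # artist 2: very different profile
], dtype=float)

def cosine(u, v):
    """Cosine similarity: angle between tag vectors, magnitude ignored."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

sim_01 = cosine(dtm[0], dtm[1])  # similar profiles -> close to 1
sim_02 = cosine(dtm[0], dtm[2])  # dissimilar profiles -> close to 0
```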

Cosine similarity: example

DTM engineered features

 

  1. Pairwise similarity between artists (i.e., tag vectors) in a collaboration (specific to tracks that are collaborations).

  2. Mean similarity to other artists: how generally “close” a tag vector is to the rest.

  3. Similarity entropy of an artist with all other artists: measures how evenly similar artist \(i\) is to all other artists, i.e., how “focused” or “distributed” the similarity of a tag vector is. High entropy means an artist is similarly related to many artists (a genre-blender or mainstream artist); low entropy means an artist is strongly similar to only a few artists (a niche or highly distinctive artist).

  4. Distance of a tag vector to the centroid (cosine-based): interpret each artist as a vector in the similarity space and compute its distance to the centroid of all artists (generic, central, and similar to many others vs. distinctive, specific, and potentially an outlier).

To sum up

Variable Description Type
track_popularity Popularity score of the track (0–100) from Spotify Y
top_ten Indicator whether the track reached Top 10 Y
peek_position Highest chart position reached Y
times_at_peek Number of times the track was at its peak position Y
peek_streams Number of streams during the peak week Y
streams Total number of streams Y
track_id Unique identifier of the track X
explicit Whether the track contains explicit content X
album_type Type of album (e.g., single, album) X
release_date Date when the track was released X
disc_number Disc number (in case of multi-disc albums) X
total_tracks Total number of tracks in the album X
track_number Position of the track within the album X
number_markets Number of markets where the track is available X
markets List of countries where the track is available X
collaboration Whether the track is a collaboration X
number_collaborators Number of artists involved in the track X
role Role of the artist (lead, feature, solo) X
artist_genres List of genres associated with the artist X
followers Number of followers the artist has X
artist_popularity Popularity score of the artist (0–100) from Spotify X
weeks Number of weeks in the ranking or tracking period X
time_signature Musical time signature (e.g., 4/4, 3/4) X
track_name Name of the track X
duration_ms Duration of the track in milliseconds X
danceability How suitable a track is for dancing (0–1) X
energy Energy level of the track (0–1) X
key Key of the song (0=C, 1=C#/Db, …, 11=B) X
loudness Average loudness in decibels X
mode Musical mode: major (1) or minor (0) X
speechiness Degree of spoken words in the track (0–1) X
acousticness Confidence that the track is acoustic (0–1) X
instrumentalness Likelihood that the track has no vocals (0–1) X
liveness Probability the track was recorded live (0–1) X
valence Musical positiveness or mood (0–1) X
tempo Beats per minute (BPM) of the track X
year Release year X
month Release month X
ratio_pop artist_popularity divided by the sum of collaborators’ popularity if a collaboration, otherwise 1 X
ratio_followers followers divided by the sum of collaborators’ followers if a collaboration, otherwise 1 X
more_popular Logical: artist_popularity > mean_collab_popularity X
more_followers Logical: followers > mean_collab_followers X
centrality_measures graph-related centrality measures X
distance_measures DTM-related centrality measures X

Predicting success with classifiers

Modeling strategy

 

  • We choose three different categorical responses. Specifically:

    • track is in the top_ten
    • track is in the top quartile of the streams distribution
    • track is in the top 50% of the streams distribution
  • We fit different statistical learning models to the dataset. These are:

    • Logistic regression (lr, with and without regularization)
    • Two ensembles: a random forest (rf) and an extreme gradient boosting classifier (xgb)
    • Two support vector machines (linear and non-linear): lsvc and svc
    • Neural network (nn)
  • Performance of all models is assessed on held-out data via 5-fold cross-validation

  • Regularization parameters are selected via cross-validation (using the F1 score as the metric of predictive performance)

The best models for each response will be further analyzed
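The comparison loop can be sketched with scikit-learn. Toy data and a reduced model list (lr, rf, lsvc) stand in for the full setup, and hyperparameter grids are omitted.

```python
# Sketch: candidate classifiers scored on a binary response via 5-fold CV,
# with F1 as the selection metric (mirroring the strategy above).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Toy stand-in for the chart features and a binary success indicator.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

models = {
    "lr": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "rf": RandomForestClassifier(random_state=0),
    "lsvc": make_pipeline(StandardScaler(), LinearSVC()),
}
scores = {name: cross_val_score(m, X, y, cv=5, scoring="f1").mean()
          for name, m in models.items()}
best = max(scores, key=scores.get)   # model with the highest mean F1
```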

Estimation results

 

Response: track in the top ten
Model mean_test_accuracy mean_test_f1 mean_test_recall mean_test_precision
lr 0.626 0.270 0.600 0.175
rf 0.819 0.326 0.389 0.285
xgb 0.885 0.138 0.081 0.526
xgb2 0.788 0.316 0.437 0.251
svc 0.744 0.304 0.496 0.221
lsvc 0.649 0.289 0.631 0.188
nn 0.885 0.020 0.010 0.233

 

Response: track in the top quartile of streams
Model mean_test_accuracy mean_test_f1 mean_test_recall mean_test_precision
lr 0.573 0.420 0.613 0.322
rf 0.620 0.450 0.623 0.353
xgb 0.737 0.287 0.213 0.447
xgb2 0.559 0.453 0.732 0.328
svc 0.606 0.457 0.666 0.349
lsvc 0.624 0.443 0.599 0.352
nn 0.738 0.242 0.171 0.431

 

Response: track in the top 50% of streams
Model mean_test_accuracy mean_test_f1 mean_test_recall mean_test_precision
lr 0.607 0.608 0.612 0.606
rf 0.669 0.679 0.700 0.660
xgb 0.664 0.669 0.678 0.660
xgb2 0.624 0.700 0.880 0.582
svc 0.659 0.665 0.678 0.654
lsvc 0.633 0.629 0.622 0.636
nn 0.634 0.653 0.691 0.624

Interpretability of black-box models

Shapley values

 

  • Measure each variable’s individual contribution to a model’s prediction.

  • Shapley values produce local explanations (individual predictions) but can be interpreted globally

  • How it works: for each prediction of the model:

    1. Consider a baseline prediction (the average)
    2. For every variable \(i\) and every possible combination of features \(J\), \(i \not\in J\)
    • Obtain the prediction of the model including features \(J\) with and without feature \(i\): the difference between these two predictions is the contribution of \(i\)
    • Average the contributions computed in the previous step (contribution of \(i\) for all possible subsets \(J\))
  • The method relies on Monte Carlo sampling to approximate Shapley values by randomly sampling subsets of features rather than evaluating them all.
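A minimal sketch of the permutation-sampling estimator on a toy linear model (not the authors' classifier); absent features are replaced by baseline means. For a linear model, the Shapley value of feature \(i\) at \(x\) is \(w_i (x_i - \bar{x}_i)\), which the estimate recovers.

```python
# Sketch: Monte Carlo Shapley values via random feature orderings. For each
# draw, J is the set of features preceding i in a random permutation; the
# contribution of i is the prediction change from adding it to J.
import random

random.seed(0)
w = [2.0, -1.0, 0.5]          # toy linear model weights (illustrative)
mean_x = [0.0, 0.0, 0.0]      # baseline feature values
x = [1.0, 2.0, -1.0]          # instance to explain

def predict(values):
    return sum(wi * vi for wi, vi in zip(w, values))

def shapley(i, n_draws=2000):
    n = len(x)
    total = 0.0
    for _ in range(n_draws):
        perm = random.sample(range(n), n)      # random feature ordering
        J = set(perm[:perm.index(i)])          # features preceding i
        base = [x[j] if j in J else mean_x[j] for j in range(n)]
        with_i = list(base)
        with_i[i] = x[i]
        total += predict(with_i) - predict(base)
    return total / n_draws

phi0 = shapley(0)   # close to w[0] * (x[0] - mean_x[0]) = 2.0
```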

Shapley values: top ten

Shapley values: top quartile

Shapley values: top 50%

Summary: impact of individual variables on model predictions

 

Feature Top_ten Top_25 Top_50
Entropy (+) (+) (+-)
Centroid similarity (+) (-) (-)
Mean similarity (+) (+) (+)
Betweenness (+) (+) (+-)
Eigenvector centrality (+-) (+-) (+-)
Closeness (+-) (+-) (+-)
Similarity (pairwise) (-) (+-) (+-)
Popularity ratio (+-) (-) (+-)
Followers ratio (-) (-) (+-)
Track characteristics Pop(+), hip-hop(-), duration (+), energy (-) Speechiness (-), energy(-), danceability(+), explicit (-), number collaborators (-) Explicit (-), loudness (+), danceability (+), speechiness (-), energy (-), duration(+-)

Discussion

Findings

 

The work has generated insights about success in streaming music. Particularly:

  1. We refined the representation of artists using two (unstructured) data sources:
    • As nodes in a graph, which allows us to incorporate metrics of the connections between them
    • As mappings to user-defined labels, which introduces measures of similarity/distance that go beyond curated lists of genres (which may convey different meanings to different users)
  2. Based on the Shapley values for the best models, we find that:
    • Graph centrality metrics are weakly associated with chart performance, particularly betweenness, which measures the extent to which nodes (artists) facilitate the flow of information within the network
    • Similarity-derived metrics are consistently and strongly associated with the outcome: in most cases, high values are related to success. Higher values of entropy, centroid similarity, or mean similarity increase the predicted probability of success
    • Regarding collaborations, we find some evidence of pairwise similarity affecting success negatively: the more dissimilar the collaborating artists, the higher the probability of success
    • Furthermore, there is some evidence of asymmetric benefits from collaborations: being the lead in an asymmetric collaboration (having fewer followers than the featured artist) is associated with a greater probability of success.

Concluding remarks

 

  • Incorporating flexible (non-parametric) approaches to predicting success in the streaming economy brought about gains in model performance

  • The integration of unstructured data into quantitative research broadens the methodological toolkit of researchers in cultural economics, allowing for a richer and more nuanced representation of complex problems

  • Of course, predictive performance and empirical associations are not causal relations. However, they can be seen as a first step toward estimating structural coefficients using machine learning techniques:

    • Double/debiased machine learning
    • Metalearners for heterogeneous treatment effects (HTEs)
    • Other techniques for HTEs (e.g., causal forests)

Thanks!

Appendix: models excluding network metrics

Training results

 

Model mean_test_accuracy mean_test_f1 mean_test_recall mean_test_precision
lr 0.613 0.241 0.539 0.156
rf 0.767 0.302 0.446 0.229
xgb 0.880 0.089 0.051 0.340
xgb2 0.724 0.295 0.511 0.209
svc 0.690 0.258 0.479 0.178
lsvc 0.627 0.263 0.589 0.170
nn 0.881 0.018 0.010 0.071

 

Model mean_test_accuracy mean_test_f1 mean_test_recall mean_test_precision
lr 0.569 0.418 0.615 0.318
rf 0.657 0.448 0.560 0.375
xgb 0.733 0.267 0.195 0.427
xgb2 0.593 0.447 0.663 0.338
svc 0.627 0.450 0.611 0.356
lsvc 0.621 0.441 0.600 0.350
nn 0.737 0.154 0.103 0.431

 

Model mean_test_accuracy mean_test_f1 mean_test_recall mean_test_precision
lr 0.569 0.418 0.615 0.318
rf 0.657 0.448 0.560 0.375
xgb 0.733 0.267 0.195 0.427
xgb2 0.593 0.447 0.663 0.338
svc 0.627 0.450 0.611 0.356
lsvc 0.621 0.441 0.600 0.350
nn 0.737 0.154 0.103 0.431

Shapley values

References

 

Askin, Noah, and Michael Mauskapf. 2017. “What Makes Popular Culture Popular? Product Features and Optimal Differentiation in Music.” American Sociological Review 82 (5): 910–44.
Im, Hyunsuk, Haeyeop Song, and Jaemin Jung. 2018. “A Survival Analysis of Songs on Digital Music Platform.” Telematics and Informatics 35 (6): 1675–86. https://doi.org/10.1016/j.tele.2018.04.013.
Interiano, Myra, Kamyar Kazemi, Lijia Wang, Jienian Yang, Zhaoxia Yu, and Natalia L. Komarova. 2018. “Musical Trends and Predictability of Success in Contemporary Songs in and Out of the Top Charts.” Royal Society Open Science 5 (5): 171274. https://doi.org/10.1098/rsos.171274.
Kaimann, Daniel, Ilka Tanneberg, and Joe Cox. 2021. “I Will Survive: Online Streaming and the Chart Survival of Music Tracks.” Managerial and Decision Economics 42 (1): 3–20. https://doi.org/10.1002/mde.3226.
Kang, Inwon, Michael Mandulak, and Boleslaw K. Szymanski. 2022. “Analyzing and Predicting Success of Professional Musicians.” Scientific Reports 12 (1): 21838. https://doi.org/10.1038/s41598-022-25430-9.
Lee, Myounggu, and Hye-jin Kim. 2024. “Exploring Determinants of Digital Music Success in South Korea.” Electronic Commerce Research 24 (3): 1659–80.
McKenzie, Jordi, Paul Crosby, and Liam J. A. Lenten. 2021. “It Takes Two, Baby! Feature Artist Collaborations and Streaming Demand for Music.” Journal of Cultural Economics 45 (3): 385–408. https://doi.org/10.1007/s10824-020-09396-y.
Ordanini, Andrea, Joseph C. Nunes, and Anastasia Nanni. 2018. “The Featuring Phenomenon in Music: How Combining Artists of Different Genres Increases a Song’s Popularity.” Marketing Letters 29 (4): 485–99. https://doi.org/10.1007/s11002-018-9476-3.
Saragih, Harriman Samuel. 2023. “Predicting Song Popularity Based on Spotify’s Audio Features: Insights from the Indonesian Streaming Users.” Journal of Management Analytics 10 (4): 693–709. https://doi.org/10.1080/23270012.2023.2239824.
Sobchuk, Oleg, Mason Youngblood, and Olivier Morin. 2024. “First-Mover Advantage in Music.” EPJ Data Science 13 (37): 1–12. https://doi.org/10.1140/epjds/s13688-024-00476-z.
South, Tobin, Matthew Roughan, and Lewis Mitchell. 2020. “Popularity and Centrality in Spotify Networks: Critical Transitions in Eigenvector Centrality.” Journal of Complex Networks 8 (6): cnaa050.